Limitations of visual speech recognition
Abstract
In this paper we investigate the limits of automated lip-reading systems and we consider the improvement that could be gained were additional information from other (non-visible) speech articulators available to the recogniser. Hidden Markov model (HMM) speech recognisers are trained using electromagnetic articulography (EMA) data drawn from the MOCHA-TIMIT data set. Articulatory information is systematically withheld from the recogniser and the performance is tested and compared with that of a typical state-of-the-art lip-reading system. We find that, as expected, the performance of the recogniser degrades as articulatory information is lost, and that a typical lip-reading system achieves a level of performance similar to an EMA-based recogniser that uses information from only the front of the tongue forwards. Our results show that there is significant information in the articulator positions towards the back of the mouth that could be exploited were it available, but even this is insufficient to achieve the same level of performance as can be achieved by an acoustic speech recogniser.
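The idea of systematically withholding articulatory information can be pictured with a minimal sketch. The coil names, their front-to-back ordering, and the helper `withhold_back_articulators` below are assumptions made for illustration only; they are not taken from the paper, and the sketch covers feature selection only, not HMM training.

```python
# Hypothetical coil names, ordered roughly from the front of the vocal tract
# to the back. This naming and ordering is an assumption for illustration.
COILS_FRONT_TO_BACK = ["upper_lip", "lower_lip", "lower_incisor",
                       "tongue_tip", "tongue_body", "tongue_dorsum", "velum"]


def withhold_back_articulators(frame, last_visible):
    """Keep only the coils from the lips back to `last_visible` (inclusive).

    `frame` maps a coil name to its (x, y) position for one time step.
    Returns a flat feature vector [x0, y0, x1, y1, ...] for the kept coils.
    """
    cutoff = COILS_FRONT_TO_BACK.index(last_visible) + 1
    features = []
    for coil in COILS_FRONT_TO_BACK[:cutoff]:
        x, y = frame[coil]
        features.extend([x, y])
    return features


# One made-up frame: every coil gets a dummy (x, y) position.
frame = {name: (float(i), float(-i)) for i, name in enumerate(COILS_FRONT_TO_BACK)}

full = withhold_back_articulators(frame, "velum")            # all coils kept
front_only = withhold_back_articulators(frame, "tongue_tip")  # back coils withheld
print(len(full), len(front_only))
```

Under these assumptions, a recogniser trained on `front_only` features would see only the articulators from the tongue tip forwards, which is the kind of condition the abstract compares with a lip-reading system.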
Similar resources
Applying Speech-to-Text Systems in Health Care: Advantages, Limitations, Solutions
Background and Aim: The applicability of any technology to a given field is determined by defining the advantages and disadvantages of the system in that field. The aim of this study is to show the advantages and limitations of using speech recognition systems in health care and to provide practical solutions to improve the acceptability of the system in that field. Materials and M...
Correlation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants
Background: Variability in speech performance is a major concern for children with cochlear implants (CIs). Spectral resolution is an important acoustic component in speech perception. Considerable variability and limitations of spectral resolution in children with CIs may lead to individual differences in speech performance. The aim of this study was to assess the correlation between auditory ...
Real-time audio-visual voice activity detection for speech recognition in noisy environments
Voice activity detection (VAD) is one of the most critical factors in the performance degradation of speech recognition in noisy environment applications. A real-time VAD was developed by using face parameters (eye and lip contours) as a front-end for the traditional speech and noise (audio) GMM-based method. Speech recognition performance of the audiovisual VAD is shown to be comparable with audio-o...
P65: Speech Recognition Based on Brain Signals by the Quantum Support Vector Machine for Inflammatory Patient ALS
People communicate with each other by exchanging verbal and visual expressions. However, paralyzed patients with various neurological diseases such as amyotrophic lateral sclerosis and cerebral ischemia have difficulties in daily communications because they cannot control their body voluntarily. In this context, brain-computer interface (BCI) has been studied as a tool of communication for thes...
A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition
The use of visual information from the speaker’s mouth region has been shown to improve the performance of Automatic Speech Recognition (ASR) systems. This is particularly useful in the presence of noise, which even in moderate form severely degrades the speech recognition performance of systems using only audio information. Various sets of features extracted from the speaker’s mouth region have been used to im...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...